Finding the Storyteller: Automatic Spoiler Tagging using Linguistic Cues
Authors
Abstract
Given a movie comment, does it contain a spoiler? A spoiler is a comment that, when disclosed, would ruin a surprise or reveal an important plot detail. We study automatic methods to detect comments and reviews that contain spoilers and apply them to reviews from the IMDB (Internet Movie Database) website. We develop topic models, based on Latent Dirichlet Allocation (LDA), but using linguistic dependency information in place of simple features from bag of words (BOW) representations. Experimental results demonstrate the effectiveness of our technique over four movie-comment datasets of different scales.
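The contrast between bag-of-words features and dependency-based features can be illustrated with a minimal sketch. The abstract does not say how dependency information is encoded, so everything here is an assumption made for illustration: the example sentence, the hand-written dependency triples, the relation labels, and the `head|relation|dependent` token format. In practice the triples would come from a dependency parser, and the resulting counts would be passed to an LDA implementation in place of word counts.

```python
from collections import Counter

# Hypothetical comment, already tokenized and lowercased.
TOKENS = ["the", "hero", "dies", "in", "the", "end"]

# Hypothetical, hand-written dependency parse of the same comment as
# (head, relation, dependent) triples. The labels are illustrative only.
TRIPLES = [
    ("dies", "nsubj", "hero"),
    ("hero", "det", "the"),
    ("dies", "obl", "end"),
    ("end", "case", "in"),
    ("end", "det", "the"),
]


def bow_features(tokens):
    """Bag-of-words view: count each word, discarding all structure."""
    return Counter(tokens)


def dependency_features(triples):
    """Dependency view: each (head, relation, dependent) triple becomes one feature."""
    return Counter(f"{head}|{rel}|{dep}" for head, rel, dep in triples)


bow = bow_features(TOKENS)
deps = dependency_features(TRIPLES)

print(sorted(bow.items()))
print(sorted(deps.items()))
```

Under this encoding, a feature such as `dies|nsubj|hero` records who did what, which the separate word counts for "hero" and "dies" cannot express.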
Similar resources
A Part-of-Speech Tagging System for the Persian Language
Abstract: Part-of-speech (POS) tagging is an essential step for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, and automatic speech recognition. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...
Tagging Of Speech Acts And Dialogue Games In Spanish Call Home
The Clarity project is devoted to automatic detection and classification of discourse structures in casual, non-task-oriented conversation using shallow, corpus-based methods of analysis. For the Clarity project, we have tagged speech acts and dialogue games in the Call Home Spanish corpus. We have done preliminary cross-level experiments on the relationship of word and speech act n-grams to di...
Automatic Refinement of Linguistic Rules for Tagging
This paper describes an approach to POS tagging based on the automatic refinement of manually written linguistic tagging rules. The refinement was carried out by means of a learning algorithm based on decision trees. The tagging rules work on ambiguity classes: each input word undergoes a morphological analysis and a set of possible tags is returned. The set of tags determines the ambiguity cla...
Automatic Tracking of Obsolescent Segments with Linguistic Cues
This paper deals with the description and the automatic tracking of text segments containing obsolescence in encyclopedia texts. We assume that despite the non-linguistic nature of this phenomenon, discursive cues are relevant to track those segments. For that purpose, we have worked on a corpus which has been manually annotated by experts and on which we have projected automatically tracked cu...
Part-of-Speech Tagging Without Training
The development of the Internet and the World Wide Web can be either a threat to the survival of indigenous languages or an opportunity for their development. The choice between cultural diversity and linguistic uniformity is in our hands and the outcome depends on our capability to devise, design and use tools and techniques for the processing of natural languages. Unfortunately natural langua...